Remove (almost) all objects from the R environment. This helps ensure the program runs without conflicts from objects already in the workspace.
rm(list = ls())
rmvn <- function(n, mu = 0, V = matrix(1)) {
  p <- length(mu)
  if (any(is.na(match(dim(V), p))))
    stop("Dimension problem!")
  D <- chol(V)
  t(matrix(rnorm(n * p), ncol = p) %*% D + rep(mu, rep(n, p)))
}
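As a quick sanity check (a toy example, not part of the original script), draws from rmvn should have sample means near mu and sample covariance near V; the definition is repeated so the snippet runs on its own:

```r
# rmvn as defined above, repeated here so the snippet is self-contained
rmvn <- function(n, mu = 0, V = matrix(1)) {
  p <- length(mu)
  D <- chol(V)
  # rows of the result are the p dimensions, columns the n draws
  t(matrix(rnorm(n * p), ncol = p) %*% D + rep(mu, rep(n, p)))
}

set.seed(42)
V <- matrix(c(2, 0.5, 0.5, 1), 2, 2)
draws <- rmvn(5000, mu = c(1, -1), V = V)  # 2 x 5000
round(rowMeans(draws), 1)    # close to c(1, -1)
round(cov(t(draws)), 1)      # close to V
```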
set.seed(1)
n <- 100
coords <- cbind(runif(n, 0, 1), runif(n, 0, 1))
x <- cbind(1, rnorm(n))
B <- as.matrix(c(1, 5))
sigma.sq <- 5
tau.sq <- 1
phi <- 3/0.5
D <- as.matrix(dist(coords))
R <- exp(-phi * D)
w <- rmvn(1, rep(0, n), sigma.sq * R)
y <- rnorm(n, x %*% B + w, sqrt(tau.sq))
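The value phi = 3/0.5 is chosen so the effective spatial range is about 0.5 on the unit square: under the exponential correlation exp(-phi * d), correlation falls to roughly 0.05 at distance 3/phi. A quick check of this arithmetic (not part of the original script):

```r
phi <- 3/0.5
d.eff <- -log(0.05) / phi   # distance at which correlation hits 0.05
d.eff                       # ~0.5
exp(-phi * 0.5)             # correlation at distance 0.5, ~0.05
```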
library(spNNGP)
n.samples <- 500
starting <- list(phi = phi, sigma.sq = 5, tau.sq = 1)
tuning <- list(phi = 0.5, sigma.sq = 0.5, tau.sq = 0.5)
priors <- list(phi.Unif = c(3/1, 3/0.01), sigma.sq.IG = c(2, 5), tau.sq.IG = c(2,
1))
cov.model <- "exponential"
m.s <- spNNGP(y ~ x - 1, coords = coords, starting = starting, method = "latent",
    n.neighbors = 5, tuning = tuning, priors = priors, cov.model = cov.model,
    n.samples = n.samples, n.omp.threads = 2)
----------------------------------------
Building the neighbor list
----------------------------------------
----------------------------------------
Building the neighbors of neighbors list
----------------------------------------
----------------------------------------
Model description
----------------------------------------
NNGP Latent model fit with 100 observations.
Number of covariates 2 (including intercept if specified).
Using the exponential spatial correlation model.
Using 5 nearest neighbors.
Number of MCMC samples 500.
Priors and hyperpriors:
beta flat.
sigma.sq IG hyperpriors shape=2.00000 and scale=5.00000
tau.sq IG hyperpriors shape=2.00000 and scale=1.00000
phi Unif hyperpriors a=3.00000 and b=300.00000
Source compiled with OpenMP support and model fit using 2 thread(s).
----------------------------------------
Sampling
----------------------------------------
Sampled: 100 of 500, 20.00%
Report interval Metrop. Acceptance rate: 58.00%
Overall Metrop. Acceptance rate: 58.00%
-------------------------------------------------
Sampled: 200 of 500, 40.00%
Report interval Metrop. Acceptance rate: 53.00%
Overall Metrop. Acceptance rate: 55.50%
-------------------------------------------------
Sampled: 300 of 500, 60.00%
Report interval Metrop. Acceptance rate: 52.00%
Overall Metrop. Acceptance rate: 54.33%
-------------------------------------------------
Sampled: 400 of 500, 80.00%
Report interval Metrop. Acceptance rate: 42.00%
Overall Metrop. Acceptance rate: 51.25%
-------------------------------------------------
Sampled: 500 of 500, 100.00%
Report interval Metrop. Acceptance rate: 55.00%
Overall Metrop. Acceptance rate: 52.00%
-------------------------------------------------
round(summary(m.s$p.beta.samples)$quantiles[, c(3, 1, 5)], 2)
50% 2.5% 97.5%
x1 2.11 1.4 2.79
x2 4.87 4.5 5.23
round(summary(m.s$p.theta.samples)$quantiles[, c(3, 1, 5)], 2)
50% 2.5% 97.5%
sigma.sq 3.95 2.17 6.12
tau.sq 0.92 0.26 2.38
phi 11.77 4.45 25.91
plot(w, apply(m.s$p.w.samples, 1, median), xlab = "True w", ylab = "Posterior median w")
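Beyond the scatter plot, one can ask how often the 95% credible intervals for w cover the simulated truth. The sketch below uses simulated stand-ins so it runs on its own; in the analysis above, w.samples would be m.s$p.w.samples (an n x n.samples matrix) and w.true the simulated vector w:

```r
set.seed(1)
n <- 100; n.post <- 500
w.true <- rnorm(n)
# Stand-ins for the posterior draws: centered off the truth, as a real
# posterior would be, so coverage is not trivially 100%
post.mean <- w.true + rnorm(n, sd = 0.3)
w.samples <- matrix(rnorm(n * n.post, mean = post.mean, sd = 0.3), nrow = n)
ci <- apply(w.samples, 1, quantile, probs = c(0.025, 0.975))
coverage <- mean(w.true >= ci[1, ] & w.true <= ci[2, ])
coverage  # near 0.95 for a well-calibrated posterior
```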
# Harvard Forest canopy height analysis
Here we consider forest canopy height (m) data measured using NASA Goddard’s LiDAR Hyperspectral and Thermal (G-LiHT) Airborne Imager over a subset of the Harvard Forest Simes Tract, MA, collected in Summer 2012. This is a sampling LiDAR system that records only strips of canopy height across the landscape. We would like to use the Harvard Forest data to assess whether the current density of LiDAR measurements can be reduced, which would allow wider strips to be collected. Ultimately, interest is in creating wall-to-wall maps of forest canopy height with associated uncertainty.
Let’s load the necessary packages and canopy height data which are part of the spNNGP package. Here too, we subset the data and divide it into a model and testing set.
library(geoR)
library(raster)
library(leaflet)
CHM <- raster("HARV/CHM/HARV_chmCrop.tif")
CHM <- as.data.frame(CHM, xy = TRUE)
CHM <- CHM[(CHM[, 3] > 0), ]
row.has.na <- apply(CHM, 1, function(x) {
any(is.na(x))
})
CHM <- CHM[(!row.has.na), ]
set.seed(1)
mod <- sample(1:nrow(CHM), 25000)
ho <- sample((1:nrow(CHM))[-mod], 10000)
CHM.mod <- CHM[mod, ]
CHM.ho <- CHM[ho, ]
Let’s again start with a leaflet basemap and then overlay the canopy height data. Recall, leaflet maps expect data to be in a geographic coordinate system (i.e., longitude and latitude), so we first need to reproject the CHM data (just for visualization purposes; we’ll fit the model using the projected coordinates).
chm.r <- rasterFromXYZ(CHM)
proj4string(chm.r) <- "+proj=utm +zone=18 +datum=WGS84 +units=m +no_defs +ellps=WGS84 +towgs84=0,0,0"
chm.r.ll <- projectRaster(chm.r, crs = "+proj=longlat +datum=WGS84")
pal <- colorNumeric(rev(terrain.colors(50)), domain = values(chm.r.ll), na.color = "transparent")
base.map <- leaflet(width = "100%") %>%
addProviderTiles("Esri.WorldImagery", group = "Satellite") %>%
addProviderTiles("Esri.WorldShadedRelief", group = "Terrain")
base.map %>%
addRasterImage(chm.r.ll, colors = pal, opacity = 1, group = "Canopy height") %>%
addLegend("bottomright", pal = pal, values = values(chm.r.ll), opacity = 1, title = "<center>Canopy height (m)</center>") %>%
addLayersControl(baseGroups = c("Satellite", "Terrain"), overlayGroups = c("Canopy height"),
    options = layersControlOptions(collapsed = FALSE))
Warning in showSRID(uprojargs, format = "PROJ", multiline = "NO", prefer_proj =
prefer_proj): Discarded ellps WGS 84 in Proj4 definition: +proj=merc +a=6378137
+b=6378137 +lat_ts=0 +lon_0=0 +x_0=0 +y_0=0 +k=1 +units=m +nadgrids=@null
+wktext +no_defs +type=crs
Warning in showSRID(uprojargs, format = "PROJ", multiline = "NO", prefer_proj =
prefer_proj): Discarded datum World Geodetic System 1984 in Proj4 definition
Let’s fit a variogram to the data to get a sense of the spatial structure. The variog function computes the Euclidean distance matrix to construct the empirical variogram. When n is large this will likely exhaust memory, so you might need to consider only a subset of your data.
sub <- 1:10000
# note, max intersite distance is ~1.5km
v <- variog(coords = CHM.mod[sub, 1:2], data = CHM.mod[sub, 3], uvec = (seq(0, 500,
length = 30)))
variog: computing omnidirectional variogram
plot(v, xlab = "Distance (m)")
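The memory concern is easy to quantify: the dense lower triangle of the distance matrix alone needs n(n-1)/2 doubles. Back-of-the-envelope arithmetic (not part of the original script) shows why only 10,000 of the 25,000 model locations are used:

```r
n <- 10000
n.pairs <- n * (n - 1) / 2      # pairwise distances in the lower triangle
round(n.pairs * 8 / 1024^2)     # MB at 8 bytes per double, ~381 MB
n <- 25000
round(n * (n - 1) / 2 * 8 / 1024^3, 1)  # ~2.3 GB for the full model set
```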
Now let’s fit some spatial regression models using NNGP random effects.
n.samples <- 1000
starting <- list(phi = 3/50, sigma.sq = 15, tau.sq = 2.5)
tuning <- list(phi = 0.05, sigma.sq = 0.01, tau.sq = 0.01)
priors <- list(phi.Unif = c(3/1000, 3/10), sigma.sq.IG = c(2, 10), tau.sq.IG = c(2,
5))
cov.model <- "exponential"
## Response model
m.r <- spNNGP(CHM.mod[, 3] ~ 1, coords = CHM.mod[, 1:2], starting = starting, method = "response",
n.neighbors = 10, tuning = tuning, priors = priors, cov.model = cov.model, n.samples = n.samples,
n.omp.threads = 2, n.report = 100)
----------------------------------------
Building the neighbor list
----------------------------------------
----------------------------------------
Model description
----------------------------------------
NNGP Response model fit with 25000 observations.
Number of covariates 1 (including intercept if specified).
Using the exponential spatial correlation model.
Using 10 nearest neighbors.
Number of MCMC samples 1000.
Priors and hyperpriors:
beta flat.
sigma.sq IG hyperpriors shape=2.00000 and scale=10.00000
tau.sq IG hyperpriors shape=2.00000 and scale=5.00000
phi Unif hyperpriors a=0.00300 and b=0.30000
Source compiled with OpenMP support and model fit using 2 thread(s).
----------------------------------------
Sampling
----------------------------------------
Sampled: 100 of 1000, 10.00%
Report interval Metrop. Acceptance rate: 38.00%
Overall Metrop. Acceptance rate: 38.00%
-------------------------------------------------
Sampled: 200 of 1000, 20.00%
Report interval Metrop. Acceptance rate: 42.00%
Overall Metrop. Acceptance rate: 40.00%
-------------------------------------------------
Sampled: 300 of 1000, 30.00%
Report interval Metrop. Acceptance rate: 44.00%
Overall Metrop. Acceptance rate: 41.33%
-------------------------------------------------
Sampled: 400 of 1000, 40.00%
Report interval Metrop. Acceptance rate: 33.00%
Overall Metrop. Acceptance rate: 39.25%
-------------------------------------------------
Sampled: 500 of 1000, 50.00%
Report interval Metrop. Acceptance rate: 40.00%
Overall Metrop. Acceptance rate: 39.40%
-------------------------------------------------
Sampled: 600 of 1000, 60.00%
Report interval Metrop. Acceptance rate: 31.00%
Overall Metrop. Acceptance rate: 38.00%
-------------------------------------------------
Sampled: 700 of 1000, 70.00%
Report interval Metrop. Acceptance rate: 43.00%
Overall Metrop. Acceptance rate: 38.71%
-------------------------------------------------
Sampled: 800 of 1000, 80.00%
Report interval Metrop. Acceptance rate: 39.00%
Overall Metrop. Acceptance rate: 38.75%
-------------------------------------------------
Sampled: 900 of 1000, 90.00%
Report interval Metrop. Acceptance rate: 30.00%
Overall Metrop. Acceptance rate: 37.78%
-------------------------------------------------
Sampled: 1000 of 1000, 100.00%
Report interval Metrop. Acceptance rate: 39.00%
Overall Metrop. Acceptance rate: 37.90%
-------------------------------------------------
round(summary(m.r$p.beta.samples)$quantiles[c(3, 1, 5)], 2)
50% 2.5% 97.5%
18.55 18.33 18.81
round(summary(m.r$p.theta.samples)$quantiles[, c(3, 1, 5)], 3)
50% 2.5% 97.5%
sigma.sq 15.049 13.976 15.851
tau.sq 1.482 1.244 2.470
phi 0.053 0.042 0.056
plot(m.r$p.beta.samples)
plot(m.r$p.theta.samples)
m.r$run.time
user system elapsed
359.669 1.154 199.089
Now prediction for the holdout set.
burn.in <- floor(0.5 * n.samples)
X.pred <- matrix(1, nrow = nrow(CHM.ho), ncol = 1)
pred.coords <- as.matrix(CHM.ho[, c("x", "y")])
p.r <- predict(m.r, X.0 = X.pred, coords.0 = pred.coords, sub.sample = list(start = burn.in,
end = n.samples, thin = 2), n.omp.threads = 2)
----------------------------------------
Prediction description
----------------------------------------
NNGP Response model fit with 25000 observations.
Number of covariates 1 (including intercept if specified).
Using the exponential spatial correlation model.
Using 10 nearest neighbors.
Number of MCMC samples 251.
Predicting at 10000 locations.
Source compiled with OpenMP support and model fit using 2 threads.
-------------------------------------------------
Predicting
-------------------------------------------------
Location: 100 of 10000, 1.00%
...
Location: 10000 of 10000, 100.00%
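The 251 samples reported in the prediction description follow directly from the sub.sample specification: every second sample is kept, from burn.in through n.samples. The arithmetic (not part of the original script):

```r
n.samples <- 1000
burn.in <- floor(0.5 * n.samples)         # 500
length(seq(burn.in, n.samples, by = 2))   # 251 samples used for prediction
```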
y.hat.r <- apply(p.r$p.y.0, 1, mean)
plot(CHM.ho[, 3], y.hat.r, main = "Response NNGP model", xlab = "True canopy height",
ylab = "Posterior predictive distribution mean")
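A natural next step is to score the holdout predictions with RMSE and empirical 95% coverage. The sketch below uses simulated stand-ins so it runs on its own; in the analysis above, y.0 would be p.r$p.y.0 (one row of posterior predictive samples per holdout location) and y.true would be CHM.ho[, 3]:

```r
set.seed(1)
n.ho <- 10000; n.post <- 251
signal <- rnorm(n.ho, mean = 18.5, sd = 4)  # latent canopy height surface
y.true <- signal + rnorm(n.ho, sd = 2)      # stand-in for CHM.ho[, 3]
# Stand-in for p.r$p.y.0: predictive draws that include the noise term
y.0 <- matrix(rnorm(n.ho * n.post, mean = signal, sd = 2), nrow = n.ho)
y.hat <- apply(y.0, 1, mean)
rmse <- sqrt(mean((y.true - y.hat)^2))
ci <- apply(y.0, 1, quantile, probs = c(0.025, 0.975))
cover <- mean(y.true >= ci[1, ] & y.true <= ci[2, ])
round(c(rmse = rmse, coverage = cover), 2)  # rmse ~2, coverage ~0.95
```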